Partition problem

In computer science, the partition problem is an NP-complete problem. The problem is to decide whether a given multiset of integers can be partitioned into two "halves" that have the same sum. More precisely, given a multiset S of integers, is there a way to partition S into two subsets S1 and S2 such that the sum of the numbers in S1 equals the sum of the numbers in S2? The subsets S1 and S2 must form a partition in the sense that they are disjoint and together they cover S. The optimization version asks for the "best" partition, and can be stated as: find a partition into two subsets S_1, S_2 such that \max(\operatorname{sum}(S_1), \operatorname{sum}(S_2)) (or, equivalently, the difference \left|\operatorname{sum}(S_1) - \operatorname{sum}(S_2)\right|) is minimized (sometimes with the additional constraint that the sizes of the two sets in the partition must be equal, or differ by at most 1).

The partition problem is equivalent to the following special case of the subset sum problem: given a set S of integers, is there a subset S1 of S that sums to exactly t /2 where t is the sum of all elements of S? (The equivalence can be seen by defining S2 to be the difference S − S1.) Therefore, the pseudo-polynomial time dynamic programming solution to subset sum applies to the partition problem as well.
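Under this subset-sum view, the dynamic programming solution is short to state. The sketch below (the function name `can_partition` is ours, not from any particular source) tracks the set of reachable subset sums up to t/2; its running time is proportional to n·t, which is pseudo-polynomial because t can be exponential in the bit-length of the input:

```python
def can_partition(nums):
    """Decide whether nums can be split into two subsets of equal sum.

    Standard subset-sum dynamic program: track every subset sum
    reachable so far, capped at half the total. Runs in O(n * t) time,
    where t = sum(nums) -- pseudo-polynomial in the input length.
    """
    total = sum(nums)
    if total % 2 != 0:
        return False  # an odd total can never be split evenly
    target = total // 2
    reachable = {0}  # subset sums reachable with the numbers seen so far
    for x in nums:
        reachable |= {s + x for s in reachable if s + x <= target}
    return target in reachable
```

For example, `can_partition([3, 1, 1, 2, 2, 1])` finds the split {3, 2} vs. {1, 1, 2, 1}, both summing to 5.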

A variation of the partition problem is the 3-partition problem, in which the set S must be partitioned into |S|/3 triples each with the same sum. In contrast to partition, 3-partition has no pseudo-polynomial time algorithm unless P = NP: 3-partition remains NP-complete even when using unary coding.

Methods

Although the partition problem is NP-complete, there are heuristics that solve the problem in many instances, either optimally or approximately. For this reason, it has been called "The Easiest Hard Problem" by Brian Hayes.[1]

The pseudo-polynomial time dynamic programming solution for the subset sum problem applies to the partition problem as well, and gives an exact answer in polynomial time when the size of the given integers is bounded. In general, however, the numbers in the input may be exponential in the input size, and this approach may not be feasible.

One approach to the problem, imitating the way children choose teams for a game, is the greedy algorithm, which goes through the numbers in descending order, assigning each of them to whichever subset has the smaller sum. This works well when the numbers in the set are of about the same size as its cardinality or less. Another heuristic, due to Narendra Karmarkar and Richard Karp,[2] is the differencing algorithm, which at each step removes two numbers from the set and replaces them by their difference. This represents the decision to put the two numbers in different sets, without immediately deciding which one is in which set. The differencing heuristic performs better than the greedy one, but is still bad for instances where the numbers are exponential in the size of the set.[1]
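Both heuristics can be sketched in a few lines. The following is our own illustrative implementation, not code from the cited papers; for simplicity, the differencing version returns only the final partition difference rather than the two sets (recovering the sets requires bookkeeping of the merge decisions):

```python
import heapq

def greedy_partition(nums):
    """Greedy heuristic: place each number, largest first, into the
    currently lighter subset. Returns the two subset sums."""
    sums = [0, 0]
    for x in sorted(nums, reverse=True):
        i = 0 if sums[0] <= sums[1] else 1
        sums[i] += x
    return sums

def karmarkar_karp(nums):
    """Differencing heuristic: repeatedly remove the two largest
    numbers and replace them with their difference, committing the
    pair to opposite sets. Returns |sum(S1) - sum(S2)|."""
    heap = [-x for x in nums]  # simulate a max-heap by negating
    heapq.heapify(heap)
    while len(heap) > 1:
        a = -heapq.heappop(heap)
        b = -heapq.heappop(heap)
        heapq.heappush(heap, -(a - b))  # a >= b, so a - b >= 0
    return -heap[0] if heap else 0
```

On the instance {8, 7, 6, 5, 4}, the greedy algorithm ends with subset sums 17 and 13 (difference 4), while differencing achieves a difference of 2; the optimal partition {8, 7} vs. {6, 5, 4} has difference 0, illustrating that both are heuristics.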

Both heuristics have a running time of O(n^2) or less. The greedy algorithm is known to give a 4/3-approximation to the optimal solution of the optimization version. (In other words, if the greedy algorithm gives two sets S_1, S_2, then \max(\operatorname{sum}(S_1), \operatorname{sum}(S_2)) \le \frac{4}{3}\mathrm{OPT}.) More generally, we can consider a version that takes the K largest elements, and for each partition of them, extends the partition by adding the remaining elements successively to whichever set is smaller. (The greedy algorithm corresponds to K=2.) This version runs in time O(2^K n^2) and is known to give a (K+2)/(K+1) approximation; thus we have a polynomial-time approximation scheme (PTAS) for the number partition problem, though this is not an FPTAS (the running time is exponential in the desired approximation guarantee). However, there are variations of this idea that are fully polynomial-time approximation schemes for the subset-sum problem, and hence for the partition problem as well.[3][4] On the other hand, the problem of minimizing the square of the difference \left(\operatorname{sum}(S_1)-\operatorname{sum}(S_2)\right)^2 has no FPTAS unless P=NP.[5]
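The K-largest refinement can be sketched as follows (an illustrative implementation of ours, not from a specific reference): try every way of splitting the K largest numbers across the two sets, extend each split greedily with the remaining numbers, and keep the best result.

```python
from itertools import product

def complete_greedy_k(nums, k=2):
    """Try all 2^k assignments of the k largest numbers to the two
    sets, extend each greedily (largest remaining number to the
    lighter set), and return the subset-sum pair minimizing the
    larger sum. k=0 degenerates to the plain greedy algorithm."""
    nums = sorted(nums, reverse=True)
    head, tail = nums[:k], nums[k:]
    best = None
    for assignment in product((0, 1), repeat=len(head)):
        sums = [0, 0]
        for x, side in zip(head, assignment):
            sums[side] += x
        for x in tail:  # greedy extension with the remaining numbers
            sums[0 if sums[0] <= sums[1] else 1] += x
        if best is None or max(sums) < max(best):
            best = sums
    return best
```

On {8, 7, 6, 5, 4}, trying all splits of the three largest elements already finds the perfect partition with subset sums 15 and 15, which the plain greedy pass misses.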

There are also anytime algorithms, based on the differencing heuristic, that first find the solution returned by the differencing heuristic, then find progressively better solutions as time allows (possibly requiring exponential time to reach optimality, for the worst instances).[6]

Hard instances

Sets with only one perfect partition, or none at all, tend to be the hardest (or most expensive) to solve relative to their input sizes. When the values are small compared to the size of the set, perfect partitions are more likely. The problem is known to undergo a "phase transition": perfect partitions are likely for some parameter regimes and unlikely for others. If m is the number of bits needed to express any number in the set and n is the size of the set, then instances with m/n < 1 tend to have many solutions, while instances with m/n > 1 tend to have few or none. As n and m get larger, the probability of a perfect partition goes to 1 or 0 respectively. This was originally argued using methods from statistical physics by Stephan Mertens,[7] and later proved by Borgs, Chayes, and Pittel.[8]

Notes

  1. ^ a b Hayes 2002
  2. ^ Karmarkar & Karp 1982
  3. ^ Kellerer, Hans; Pferschy, Ulrich; Pisinger, David (2004). Knapsack Problems. Springer. p. 97. ISBN 9783540402862. http://books.google.com/?id=u5DB7gck08YC&pg=PA97
  4. ^ Martello, Silvano; Toth, Paolo (1990). "4 Subset-sum problem". Knapsack Problems: Algorithms and Computer Implementations. Wiley-Interscience. pp. 105–136. ISBN 0-471-92420-2. MR 1086874.
  5. ^ Woeginger, Gerhard J. (2000). "When Does a Dynamic Programming Formulation Guarantee the Existence of a Fully Polynomial Time Approximation Scheme (FPTAS)?". INFORMS Journal on Computing 12 (1): 57–74. doi:10.1287/ijoc.12.1.57.11901. http://joc.journal.informs.org/cgi/content/abstract/12/1/57. Retrieved 2009-10-05.
  6. ^ Korf 1998, Mertens 1999
  7. ^ Mertens 1998, Mertens 2001
  8. ^ Borgs, Chayes & Pittel 2001

References